Abstract


Some probabilistic illustrations of the reliability coefficient are provided to assist in interpretation 
of this measure. All explanations are derived under the assumption that the joint distribution of 
examinee scores from two parallel tests is well approximated by a bivariate normal distribution. 

A number of probabilistic interpretations of reliability coefficients are readily available given
concepts of simple random sampling from a very large population and given a bivariate normal 
approximation for the joint distribution of test scores from two parallel test forms. In this note, 
three such interpretations are provided. The first considers the probability that, if two examinees 
are selected at random and scores on Forms 1 and 2 are recorded, then the same examinee 
obtains the higher score on both forms. The second considers the probability that an examinee 
who has exceeded a cut-point score on Form 1 also exceeds the cut point on Form 2. The third 
interpretation considers the interval width based on Form 1 that suffices to include the examinee 
score on Form 2 with a given probability. 

In each example, it is an elementary matter to provide a table that gives, for a given
reliability, the corresponding probability or interval width. The tables may therefore be of some
use in explaining the meaning of reliability to less technically oriented audiences. The
mathematical formulas for computation of table entries are not difficult to derive, but they are
not themselves readily understood by audiences that are not mathematically sophisticated.


1. Constant Order 

Let two examinees numbered 1 and 2 be obtained by simple random sampling without
replacement from a very large population of examinees. Let examinee $k$, $k$ equal to 1 or 2, have
score $X_{jk}$ on Form $j$. Given the assumption of simple random sampling, the joint distribution
of scores $X_{11}$ and $X_{21}$ for Examinee 1 is the same as the joint distribution of the scores
$X_{12}$ and $X_{22}$ for Examinee 2. Given the assumption that the population is very large, scores
for Examinee 1 can be regarded as independent of scores for Examinee 2. For simplicity, assume that
the joint distribution of $X_{1k}$ and $X_{2k}$ is well approximated by a bivariate normal
distribution. The assumption that the forms are parallel implies that $X_{1k}$ and $X_{2k}$ have
common mean $\mu$ and common standard deviation $\sigma$ for each examinee $k$. Let the reliability
coefficient be $\rho^2$, so that $\rho^2$ is the correlation of $X_{1k}$ and $X_{2k}$ for each
examinee $k$.

Consider the concordance probability $C$ that the same examinee has the higher score on both
examinations. Under the bivariate normal approximation, this probability, which is encountered
in the study of Kendall's $\tau$ (Kruskal, 1958), is equal to $0.5 + \pi^{-1}\sin^{-1}(\rho^2)$.
Table 1 lists $C$ for selected values of $\rho^2$.

Table 1.

Relationship of Reliability and Concordance

Reliability    Concordance       Concordance
               (form to form)    (form to true)

    0.0            0.50              0.50
    0.1            0.53              0.60
    0.2            0.56              0.65
    0.3            0.60              0.68
    0.4            0.63              0.72
    0.5            0.67              0.75
    0.6            0.70              0.78
    0.7            0.75              0.82
    0.8            0.80              0.85
    0.9            0.86              0.90


One simple criterion based on the table is the point at which the concordance probability
has progressed halfway from its minimum value of 0.5 to its maximum value of 1, so that
$C = 0.75$. This point is reached at $\rho^2 = 1/2^{1/2} = 0.71$.
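
The form-to-form concordance formula and the halfway criterion can be checked numerically. The following is a minimal sketch, assuming only the arcsine formula stated above; the function names are illustrative and do not come from the original note.

```python
import math


def concordance(correlation):
    """Concordance probability 0.5 + asin(r)/pi for a bivariate normal pair
    of score differences with correlation r (here r is the reliability rho^2)."""
    return 0.5 + math.asin(correlation) / math.pi


def reliability_for_concordance(c):
    """Invert C = 0.5 + asin(rho^2)/pi to get the reliability rho^2."""
    return math.sin(math.pi * (c - 0.5))


# Form-to-form column of Table 1 for a few reliabilities.
for reliability in (0.0, 0.5, 0.9):
    print(reliability, round(concordance(reliability), 2))

# Reliability at which C is halfway between 0.5 and 1.
print(round(reliability_for_concordance(0.75), 2))
```

Inverting the formula makes the halfway criterion explicit: $C = 0.75$ corresponds to $\sin(\pi/4) = 1/2^{1/2}$.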

Interestingly, the choice of $\rho^2 = 1/2^{1/2}$ also arises in a very different setting. For
examinee $k$, consider the best predictor of score $X_{2k}$ on Form 2 from score $X_{1k}$ on Form 1.
Under the bivariate normal approximation, the mean-squared error of this predictor is
$\sigma^2(1 - \rho^4)$. Without knowledge of $X_{1k}$, $\sigma^2$ is the best mean-squared error
achievable by prediction of $X_{2k}$ by a constant $c$. If $\rho^2 = 1/2^{1/2}$, then the
mean-squared error $\sigma^2/2$ from optimal prediction of $X_{2k}$ from $X_{1k}$ is half the
mean-squared error $\sigma^2$ from optimal prediction of $X_{2k}$ by a constant.
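
The mean-squared error quoted here follows from the usual regression identity. A brief sketch, assuming that the best predictor is the conditional mean under the bivariate normal approximation (the symbol $\hat{X}_{2k}$ is introduced only for this illustration):

```latex
\begin{align*}
\hat{X}_{2k} &= \mu + \rho^2 (X_{1k} - \mu), \\
E\bigl[(X_{2k} - \hat{X}_{2k})^2\bigr]
  &= \sigma^2 - 2\rho^2 \cdot \rho^2 \sigma^2 + \rho^4 \sigma^2
   = \sigma^2 (1 - \rho^4), \\
\rho^2 = 2^{-1/2} &\;\Longrightarrow\; \sigma^2 (1 - \rho^4) = \sigma^2 / 2 .
\end{align*}
```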

Results in this section are more conservative than those obtained from an interpretation based
on a comparison of rankings based on true scores and on observed scores. For each examinee $k$,
there exists a variable $T_k$, the true score of examinee $k$, with mean $\mu$ such that
$X_{jk} = T_k + e_{jk}$ for Form $j$ and such that $T_k$, $e_{1k}$, and $e_{2k}$ are uncorrelated
for each examinee $k$. The standard deviation of $T_k$ is $\sigma\rho$, the standard deviation of
$e_{jk}$ is $\sigma(1 - \rho^2)^{1/2}$, and the correlation of $X_{jk}$ and $T_k$ is $\rho$. Under
the normal approximation, the variables $T_1$, $T_2$, $e_{11}$, $e_{21}$, $e_{12}$, and $e_{22}$ are
mutually independent (Lord & Novick, 1968, chap. 3).

Under the normal approximation, the concordance probability $C_T$ that the same examinee
has the higher score on Form $j$ and has the higher true score is equal to
$0.5 + \pi^{-1}\sin^{-1}(\rho)$. Results are listed in Table 1. In this case, the concordance
probability of 0.75 is attained for a reliability of only 0.5. Nonetheless, very high concordance
probabilities still require quite high reliability, as is evident from the probability of 0.90 for
the reliability of 0.90.
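
Both concordance probabilities can also be approximated by simulation. The sketch below is illustrative only: it assumes the classical decomposition of an observed score into a true score plus independent normal error, with true-score standard deviation $\sigma\rho$ and error standard deviation $\sigma(1 - \rho^2)^{1/2}$, which is what a reliability of $\rho^2$ implies. The function name, sample size, and seed are hypothetical choices.

```python
import math
import random


def simulate_concordance(reliability, n_pairs=100_000, seed=0, mu=0.0, sigma=1.0):
    """Monte Carlo estimates of (form-to-form, form-to-true) concordance."""
    rng = random.Random(seed)
    rho = math.sqrt(reliability)
    sd_true = sigma * rho                            # assumed: var(T) = rho^2 sigma^2
    sd_error = sigma * math.sqrt(1.0 - reliability)  # assumed: var(e) = (1 - rho^2) sigma^2
    same_on_forms = 0
    same_as_true = 0
    for _ in range(n_pairs):
        t1 = rng.gauss(mu, sd_true)          # true score, examinee 1
        t2 = rng.gauss(mu, sd_true)          # true score, examinee 2
        x11 = t1 + rng.gauss(0.0, sd_error)  # Form 1, examinee 1
        x21 = t1 + rng.gauss(0.0, sd_error)  # Form 2, examinee 1
        x12 = t2 + rng.gauss(0.0, sd_error)  # Form 1, examinee 2
        x22 = t2 + rng.gauss(0.0, sd_error)  # Form 2, examinee 2
        # Same examinee ahead on both forms.
        same_on_forms += (x11 - x12) * (x21 - x22) > 0
        # Same examinee ahead on Form 1 and on true score.
        same_as_true += (x11 - x12) * (t1 - t2) > 0
    return same_on_forms / n_pairs, same_as_true / n_pairs


form_to_form, form_to_true = simulate_concordance(0.5)
print(round(form_to_form, 3), round(form_to_true, 3))
```

For a reliability of 0.5 the two estimates should fall close to the Table 1 entries of 0.67 and 0.75, up to simulation error.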

2. Cut Points 

As an alternative interpretation, consider use of cut points. Suppose that a score is acceptable
if it falls above the $100p$th percentile for some $p$ greater than 0 and less than 1. Alternatively,
a cut point $z$ may have been selected, and $p$ is the probability that a randomly selected examinee
scores below $z$. Consider the conditional probability that an examinee receives an acceptable score
on Form 2 given that the examinee receives an acceptable score on Form 1. Let $\phi$ denote the
standard normal density, let $\Phi$ denote the standard normal distribution function, and let
$q = \Phi^{-1}(p)$ be the standard normal percentile that corresponds to $100p$. Then elementary
arguments may be used to show that the normal approximation yields the joint probability

$$J = \int_{-\infty}^{\infty} \left[1 - \Phi\!\left(\frac{q - \rho y}{(1 - \rho^2)^{1/2}}\right)\right]^2 \phi(y)\,dy$$

that a randomly selected examinee receives an acceptable score on both Form 1 and Form 2, and
$J/(1 - p)$ is the corresponding conditional probability that the examinee receives an acceptable
score on Form 2 given that an acceptable score is received on Form 1. Results are provided in
Table 2.
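
The joint probability $J$ has no closed form, but it is easy to approximate. The following sketch uses a plain trapezoid rule on the interval from $-8$ to $8$; the integration method, the limits, and the function names are assumptions made for illustration, not the procedure used to produce Table 2. It requires a reliability strictly below 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf plays the role of phi, cdf the role of Phi


def joint_probability(p, reliability, n_steps=4000, limit=8.0):
    """Trapezoid-rule estimate of J, the probability of exceeding the
    100p-th percentile on both forms."""
    q = STD_NORMAL.inv_cdf(p)
    rho = math.sqrt(reliability)
    scale = math.sqrt(1.0 - reliability)

    def integrand(y):
        # Chance of passing one form given standardized true score y.
        pass_one_form = 1.0 - STD_NORMAL.cdf((q - rho * y) / scale)
        return pass_one_form ** 2 * STD_NORMAL.pdf(y)

    h = 2.0 * limit / n_steps
    total = 0.5 * (integrand(-limit) + integrand(limit))
    total += sum(integrand(-limit + i * h) for i in range(1, n_steps))
    return total * h


def conditional_probability(p, reliability):
    """J / (1 - p): chance of passing Form 2 given a pass on Form 1."""
    return joint_probability(p, reliability) / (1.0 - p)


print(round(joint_probability(0.8, 0.9), 2),
      round(conditional_probability(0.8, 0.9), 2))
```

With a reliability of 0 the integrand reduces to $(1 - p)^2\phi(y)$, so $J = (1 - p)^2$, which gives a quick sanity check on the sketch.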

The table suggests that exceeding a cut point once does not provide much assurance of
exceeding a cut point again even with rather high reliability. Reliability does matter, for results
for $\rho^2 = 0.9$ are considerably better than for $\rho^2 = 0.8$. The greatest challenge is for
high cut points. Even for a reliability of 0.9, for $100p = 80$, the conditional probability is only
0.75 that the cut point on Form 2 is exceeded given that the cut point on Form 1 is exceeded.

More favorable results are obtained if one considers the probability that the score exceeds
the cut point given that the true score exceeds the cut point. In this instance, the conditional
probability sought is

$$(1 - p)^{-1} \int_{q}^{\infty} \left[1 - \Phi\!\left(\frac{q - \rho y}{(1 - \rho^2)^{1/2}}\right)\right] \phi(y)\,dy.$$

For some results, see Table 2. Note that high cut points still present challenges, as is evident from
the case of $100p = 80$ and $\rho^2 = 0.9$.
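
The conditional probability given the true score can be approximated in the same spirit. This is a minimal sketch, assuming a trapezoid rule from $q$ up to a finite upper limit; the function name and numerical settings are hypothetical, and the reliability must be strictly below 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_pass_given_true_pass(p, reliability, n_steps=4000, limit=8.0):
    """Estimate the probability that the observed score exceeds the cut point
    given that the (standardized) true score exceeds it."""
    q = STD_NORMAL.inv_cdf(p)
    rho = math.sqrt(reliability)
    scale = math.sqrt(1.0 - reliability)

    def integrand(y):
        return (1.0 - STD_NORMAL.cdf((q - rho * y) / scale)) * STD_NORMAL.pdf(y)

    upper = max(limit, q + 1.0)  # finite stand-in for the infinite upper limit
    h = (upper - q) / n_steps
    total = 0.5 * (integrand(q) + integrand(upper))
    total += sum(integrand(q + i * h) for i in range(1, n_steps))
    return total * h / (1.0 - p)


print(round(prob_pass_given_true_pass(0.8, 0.9), 2))
```

For $100p = 80$ and $\rho^2 = 0.9$ the estimate should agree with the corresponding Table 2 entry of 0.82 to two decimals.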

Table 2.

Probabilities of Exceeding Cut Points

Percentile   Reliability   Joint probability   Conditional probability   Conditional probability
                           (two forms)         (Form 2 given Form 1)     (form given true)

    20           0.0            0.64                  0.80                      0.80
    40           0.0            0.36                  0.60                      0.60
    60           0.0            0.16                  0.40                      0.40
    80           0.0            0.04                  0.20                      0.20
    20           0.2            0.66                  0.82                      0.85
    40           0.2            0.39                  0.65                      0.72
    60           0.2            0.19                  0.48                      0.58
    80           0.2            0.06                  0.28                      0.41
    20           0.4            0.68                  0.85                      0.88
    40           0.4            0.42                  0.70                      0.77
    60           0.4            0.22                  0.56                      0.66
    80           0.4            0.08                  0.38                      0.52
    20           0.6            0.70                  0.87                      0.91
    40           0.6            0.46                  0.76                      0.82
    60           0.6            0.26                  0.64                      0.74
    80           0.6            0.10                  0.50                      0.62
    20           0.8            0.73                  0.91                      0.94
    40           0.8            0.50                  0.83                      0.88
    60           0.8            0.30                  0.75                      0.82
    80           0.8            0.13                  0.65                      0.74
    20           0.9            0.75                  0.94                      0.95
    40           0.9            0.53                  0.88                      0.92
    60           0.9            0.33                  0.83                      0.88
    80           0.9            0.15                  0.75                      0.82


3. Intervals 

Consider use of the score $X_{1k}$ on Form 1 to provide an interval that contains the score
$X_{2k}$ on Form 2 with a given probability $1 - p$. If $z = \Phi^{-1}(1 - p/2)$, then a suitable
interval based on the normal approximation is centered at $\mu + \rho^2(X_{1k} - \mu)$ and has width
$2z\sigma(1 - \rho^4)^{1/2}$. Table 3 provides interval widths for $p = 0.05$, so that the coverage
probability is 0.95, and for $\sigma = 100$, a value relatively close to that encountered with the
SAT® I math or verbal examination. For narrower intervals, the case of $p = 0.5$ is also considered,
so that the coverage probability is 0.5. The intervals for $p = 0.5$ are considerably narrower than
for $p = 0.05$.

The table suggests that widths are not greatly reduced unless reliability is rather high. The
width is not halved until $\rho^4 = 0.75$, so that $\rho^2 = 0.866$. Even for a $\rho^2$ of 0.6, the
width is 80% of the width for $\rho^2 = 0$.

Table 3.

Widths of 100(1 - p)% Prediction Intervals for Parallel Form Score and True Score
for Standard Deviation of 100

Reliability      Form score              True score
              p = 0.05   p = 0.5     p = 0.05   p = 0.5

    0.0          392       135          392       135
    0.1          390       134          372       128
    0.2          384       132          351       121
    0.3          374       129          328       113
    0.4          359       124          304       104
    0.5          339       117          277        95
    0.6          314       108          248        85
    0.7          280        96          215        74
    0.8          235        81          175        60
    0.9          171        59          124        43

Prediction intervals for the true score $T_k$ that are based on the observed score $X_{1k}$ are
a bit narrower. Under the normal approximation, an interval that contains $T_k$ with probability
$1 - p$ has center $\mu + \rho(X_{1k} - \mu)$ and width $2z\sigma(1 - \rho^2)^{1/2}$. Results can be
found in Table 3. Here, relative to the interval width for $\rho^2 = 0$, the interval width is halved
if $\rho^2 = 0.75$, and the interval width is divided by 3 if $\rho^2 = 0.89$.
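
The entries of Table 3 can be recomputed directly from the two width formulas. The sketch below simply transcribes those formulas as stated in the text, with $\sigma = 100$ as the default; the function name is illustrative.

```python
import math
from statistics import NormalDist


def interval_widths(reliability, p, sigma=100.0):
    """Return (form-score width, true-score width) for coverage 1 - p."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    # Form score: 2 z sigma (1 - rho^4)^(1/2), where rho^2 is the reliability.
    form_width = 2.0 * z * sigma * math.sqrt(1.0 - reliability ** 2)
    # True score: 2 z sigma (1 - rho^2)^(1/2).
    true_width = 2.0 * z * sigma * math.sqrt(1.0 - reliability)
    return form_width, true_width


for reliability in (0.0, 0.5, 0.9):
    row = [round(w) for p in (0.05, 0.5) for w in interval_widths(reliability, p)]
    print(reliability, row)  # order: form 0.05, true 0.05, form 0.5, true 0.5
```

Because $z$ and $\sigma$ enter only as multipliers, the ratio of any width to the width at $\rho^2 = 0$ depends on the reliability alone, which is why the halving points quoted in the text hold for every coverage level.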


4. Conclusions 

The proposed interpretations of reliability can be used to show the consequences of
reliability coefficients of various sizes, and so to give a test user a notion of reasonable
expectations. On the whole, the measures presented appear to suggest relatively high standards for
reliability coefficients, although different individuals may interpret the numerical results in
quite distinct ways.